AITopics | kullback-leibler divergence

Collaborating Authors

kullback-leibler divergence

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Memorisation, convergence and generalisation in generative models

Maillard, Antoine, Goldt, Sebastian

arXiv.org Machine LearningMay-21-2026

Generative neural networks learn how to produce highly realistic images from a large, but finite number of examples - or do they simply memorise their training set? To settle this question, Kadkhodaie, Guth, Simoncelli and Mallat (ICLR '24) trained diffusion models independently on disjoint subsets of a dataset and showed that they converge to nearly the same density when the number of training images is large enough. This result raises two basic questions: how much data do you need for convergence, and what does convergence capture about learning the data distribution? Here, we address these questions by providing an exact analytical characterisation of the transition from memorisation to generalisation in linear generative models. We find that these models memorise at small load, while convergence emerges continuously when the number of samples is linear in the input dimension. Strikingly, we find that convergence is insensitive to recovery of the principal latent factors of the data, which are recovered in a sharp transition. After extending our approach to data with power-law spectra, we find the same distinction between convergence and latent recovery in our experiments with convolutional denoisers and in the data of Kadkhodaie et al. We thus show that generalisation in generative models decomposes into at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors. These objectives correspond to two different distances between true and learnt data distribution, and only the first one is captured by convergence.

artificial intelligence, cit, machine learning, (18 more...)

arXiv.org Machine Learning

2605.21402

Country: Europe (0.46)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

159f7fe5b51ecd663b85337e8e28ce65-Supplemental-Conference.pdf

Neural Information Processing SystemsMay-1-2026, 01:42:34 GMT

artificial intelligence, machine learning, nre, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation

Neural Information Processing SystemsMar-21-2026, 05:05:37 GMT

Since pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler Divergence (KL-Div) has been predominant, and recently its variants have achieved compelling performance. However, KL-Div only compares probabilities of the corresponding category between the teacher and student while lacking a mechanism for cross-category comparison. Besides, KL-Div is problematic when applied to intermediate layers, as it cannot handle non-overlapping distributions and is unaware of geometry of the underlying manifold. To address these downsides, we propose a methodology of Wasserstein Distance (WD) based knowledge distillation. Specifically, we propose a logit distillation method called WKD-L based on discrete WD, which performs cross-category comparison of probabilities and thus can explicitly leverage rich interrelations among categories. Moreover, we introduce a feature distillation method called WKD-F, which uses a parametric method for modeling feature distributions and adopts continuous WD for transferring knowledge from intermediate layers. Comprehensive evaluations on image classification and object detection have shown (1) for logit distillation WKD-L outperforms very strong KL-Div variants; (2) for feature distillation WKD-F is superior to the KL-Div counterparts and state-of-the-art competitors.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

Supplementary Materials for " Multi-Agent Meta-Reinforcement Learning " AT echnical Lemmas

Neural Information Processing SystemsFeb-17-2026, 06:30:20 GMT

From the three-points identity of the Bregman divergence (Lemma 3.1 of [9]), KL (x y) KL ( x y) = KL (x x) + ln x ln y,x x (12) The first term in (12) can be bounded by KL (x x) = By the Hölder's inequality, the second term in (12) is bounded as ln x ln y,x x ln x ln y Lemma 5. Consider a block diagonal matrix We prove the lemma via induction on N . This completes the induction proof.Lemma 6. We introduce one more notation before presenting the proof. This leads us to the initialization-dependent convergence rate of Algorithm 1, which we re-state and prove as follows. In addition, if we initialize the players' policies to be uniform policies, i.e., The rest of the proof follows by putting all the aforementioned results together.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Learnability of In-Context Learning

Neural Information Processing SystemsFeb-14-2026, 10:53:48 GMT

Our theoretical analysis reveals that in this setting, in-context learning is more about identifying the task than about learning it, a result which is in line with a series of recent empirical findings.

in-context learning, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Bipartite Stochastic Block Models with Tiny Clusters

Stefan Neumann

Neural Information Processing SystemsFeb-14-2026, 02:26:37 GMT

Discovering clusters in bipartite graphs has been researched in many different settings. However, most of these algorithms were heuristics and do not provide theoretical guarantees for the quality oftheir results.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.15)
North America > Canada (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

e40b60677880e7e74f8a081f65703f0d-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 11:26:30 GMT

assignment, caption, mid, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

45f0d179ef7e10eb7366550cd4e574ae-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 16:06:49 GMT

discriminator, experiment, histogram, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

Neural Information Processing SystemsDec-26-2025, 14:29:24 GMT

Kullback-Leibler (KL) divergence is one of the most important measures to calculate the difference between probability distributions. In this paper, we theoretically study several properties of KL divergence between multivariate Gaussian distributions.

gaussian distribution, kullback-leibler divergence, mathcal, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

Add feedback

Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence

Neural Information Processing SystemsDec-24-2025, 13:28:01 GMT

Existing rotated object detectors are mostly inherited from the horizontal detection paradigm, as the latter has evolved into a well-developed area. However, these detectors are difficult to perform prominently in high-precision detection due to the limitation of current regression loss design, especially for objects with large aspect ratios. Taking the perspective that horizontal detection is a special case for rotated object detection, in this paper, we are motivated to change the design of rotation regression loss from induction paradigm to deduction methodology, in terms of the relation between rotation and horizontal detection. We show that one essential challenge is how to modulate the coupled parameters in the rotation regression loss, as such the estimated parameters can influence to each other during the dynamic joint optimization, in an adaptive and synergetic way. Specifically, we first convert the rotated bounding box into a 2-D Gaussian distribution, and then calculate the Kullback-Leibler Divergence (KLD) between the Gaussian distributions as the regression loss.

kullback-leibler divergence, learning high-precision bounding box, rotated object detection, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.38)

Add feedback